Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3578] Fix upper bound in GraphGenerators.sampleLogNormal #2439

Closed
wants to merge 4 commits into from

Conversation

ankurdave
Copy link
Contributor

GraphGenerators.sampleLogNormal is supposed to return an integer strictly less than maxVal. However, it violates this guarantee. It generates its return value as follows:

var X: Double = maxVal

while (X >= maxVal) {
  val Z = rand.nextGaussian()
  X = math.exp(mu + sigma*Z)
}
math.round(X.toFloat)

When X is sampled to be close to (but less than) maxVal, then it will pass the while loop condition, but the rounded result will be equal to maxVal, which will violate the guarantee. For example, if maxVal is 5 and X is 4.9, then X < maxVal, but math.round(X.toFloat) is 5.

This PR instead rounds X before checking the loop condition, guaranteeing that the condition will hold for the return value.

@SparkQA
Copy link

SparkQA commented Sep 17, 2014

QA tests have started for PR 2439 at commit 1638598.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 18, 2014

QA tests have finished for PR 2439 at commit 1638598.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ankurdave
Copy link
Contributor Author

@rxin

@rxin
Copy link
Contributor

rxin commented Sep 19, 2014

@jegonzal you should take a look :)

@rnowling
Copy link
Contributor

@ankurdave I'd be a bit concerned about how that affects the correctness of the algorithm. Especially since this will round every value down when maybe you only want to round the edge case down. What do you think?

Also, it might be good to add this to the unit tests -- call the function a 1000 times or something and verify that it never produces a value outside the expected range.

@ankurdave
Copy link
Contributor Author

@rnowling Hmm, maybe you're right about that, since rounding down will shift the whole distribution down. It might be better to handle the edge case specially.

I'll modify the unit test to do the range check for more trials.

@SparkQA
Copy link

SparkQA commented Sep 20, 2014

QA tests have started for PR 2439 at commit 5900c22.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 20, 2014

QA tests have finished for PR 2439 at commit 5900c22.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rnowling
Copy link
Contributor

I looke at the Pregel paper but it doesn't specify and doesn't cite other
papers. I know it's a common method, though.

After some thought, I think your solution is correct. With round(), integer
values 1 through maxVal - 1 have a range of 1 but 0 and maxVal have ranges
of 0.5. With floor(), 0 will have a range of 1 and maxVal won't be
encountered.

I vote for the change. :)

On Saturday, September 20, 2014, Ankur Dave [email protected]
wrote:

@rnowling https://github.com/rnowling Hmm, maybe you're right about
that -- I'm not familiar enough with the algorithm to know whether it
specifies rounding behavior in the first place.

I think the unit test you wrote does do the range check
https://github.com/apache/spark/blob/master/graphx/src/test/scala/org/apache/spark/graphx/util/GraphGeneratorsSuite.scala#L68.
In fact that's how I found this bug in the first place: the test failed in
Jenkins in an unrelated PR.


Reply to this email directly or view it on GitHub
#2439 (comment).

em [email protected]
c 954.496.2314

This reverts commit 5900c22.
@SparkQA
Copy link

SparkQA commented Sep 22, 2014

QA tests have started for PR 2439 at commit f6655e5.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 22, 2014

QA tests have finished for PR 2439 at commit f6655e5.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jegonzal
Copy link
Contributor

This looks good to me.

@asfgit asfgit closed this in f9d6220 Sep 22, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants